Search | Portal Regional da BVS

Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets.

Minotto, Thomas; Robert, Philippe A; Hobæk Haff, Ingrid; Sandve, Geir K.

Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.

Article in English | MEDLINE | ID: mdl-38563699

ABSTRACT

Simulation frameworks are useful to stress-test predictive models when data is scarce, or to assert model sensitivity to specific data distributions. Such frameworks often need to recapitulate several layers of data complexity, including emergent properties that arise implicitly from the interaction between simulation components. Antibody-antigen binding is a complex mechanism by which an antibody sequence wraps itself around an antigen with high affinity. In this study, we use a synthetic simulation framework for antibody-antigen folding and binding on a 3D lattice that includes full details on the spatial conformation of both molecules. We investigate how emergent properties arise in this framework, in particular the physical proximity of amino acids, their presence on the binding interface, or the binding status of a sequence, and relate these to the individual and pairwise contributions of amino acids in statistical models for binding prediction. We show that weights learnt from a simple logistic regression model align with some but not all features of amino acids involved in the binding, and that predictive sequence binding patterns can be enriched. In particular, main effects correlated with the capacity of a sequence to bind any antigen, while statistical interactions were related to sequence specificity.

Subjects

Antibodies, Antifibrinolytic Agents, Feasibility Studies, Synthetic Vaccines, Amino Acids
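
The abstract above distinguishes individual (main-effect) from pairwise (interaction) contributions of amino acids in a logistic regression model. The following is a minimal sketch of how such features could be encoded for a sequence, not the authors' implementation. The feature naming scheme, the function names, and the example weights are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def main_effect_features(seq):
    """Return one indicator name per (position, amino acid) in the sequence.

    Hypothetical naming: "<position><residue>", e.g. "0C".
    """
    return {f"{i}{aa}" for i, aa in enumerate(seq)}


def pairwise_features(seq):
    """Return one indicator name per pair of (position, amino acid) terms.

    These correspond to statistical interaction terms between two positions.
    Hypothetical naming: "<pos1><res1>:<pos2><res2>", e.g. "0C:2R".
    """
    return {
        f"{i}{a}:{j}{b}"
        for (i, a), (j, b) in combinations(enumerate(seq), 2)
    }


def linear_score(seq, weights, bias=0.0):
    """Compute the linear predictor (log-odds) of a logistic regression.

    `weights` maps feature names to coefficients; absent features count as 0.
    """
    active = main_effect_features(seq) | pairwise_features(seq)
    return bias + sum(weights.get(name, 0.0) for name in active)


# Illustrative weights only; they are not taken from the paper.
example_weights = {"0C": 1.0, "0C:2R": 0.5}
score = linear_score("CAR", example_weights)
```

In this encoding, a main-effect weight describes what a residue at one position contributes on its own, while an interaction weight captures the extra contribution of two residues occurring together.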

The fraud loss for selecting the model complexity in fraud detection.

Brant, Simon Boge; Hobæk Haff, Ingrid.

J Appl Stat ; 50(10): 2209-2227, 2023.

Article in English | MEDLINE | ID: mdl-37434626

ABSTRACT

Statistical fraud detection consists in making a system that automatically selects a subset of all cases (insurance claims, financial transactions, etc.) that are the most interesting for further investigation. The reason why such a system is needed is that the total number of cases typically is much higher than one realistically could investigate manually and that fraud tends to be quite rare. Further, the investigator is typically limited to controlling a restricted number k of cases, due to limited resources. The most efficient manner of allocating these resources is then to try selecting the k cases with the highest probability of being fraudulent. The prediction model used for this purpose must normally be regularised to avoid overfitting and consequently bad prediction performance. A loss function, denoted the fraud loss, is proposed for selecting the model complexity via a tuning parameter. A simulation study is performed to find the optimal settings for validation. Further, the performance of the proposed procedure is compared to the most relevant competing procedure, based on the area under the receiver operating characteristic curve (AUC), in a set of simulations, as well as on a credit card default dataset. Choosing the complexity of the model by the fraud loss resulted in either comparable or better results in terms of the fraud loss than choosing it according to the AUC.
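
The selection rule described in the abstract above (investigate the k cases with the highest predicted fraud probability, and choose the tuning parameter by a loss evaluated on that selection) can be sketched as follows. The abstract does not give the exact definition of the fraud loss, so the loss used here, the fraction of fraud cases missed by the top-k selection, is an assumed stand-in. All function names are hypothetical.

```python
def top_k_indices(scores, k):
    """Return the indices of the k highest-scoring cases."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def missed_fraud_fraction(scores, labels, k):
    """Assumed stand-in loss: share of fraud cases NOT among the top k.

    `labels` holds 1 for fraud and 0 otherwise. This is an illustrative
    top-k loss, not necessarily the paper's fraud loss.
    """
    total_fraud = sum(labels)
    if total_fraud == 0:
        return 0.0
    caught = sum(labels[i] for i in top_k_indices(scores, k))
    return 1.0 - caught / total_fraud


def select_tuning_parameter(candidates, labels, k):
    """Pick the tuning parameter whose validation scores minimise the loss.

    `candidates` maps each tuning-parameter value to the predicted fraud
    probabilities of the model fitted with that value.
    """
    return min(
        candidates,
        key=lambda lam: missed_fraud_fraction(candidates[lam], labels, k),
    )


# Toy validation data, invented purely for illustration.
labels = [1, 0, 0, 1]
candidates = {
    0.1: [0.9, 0.1, 0.8, 0.3],  # top 2 catches one of two fraud cases
    1.0: [0.9, 0.1, 0.2, 0.8],  # top 2 catches both fraud cases
}
best = select_tuning_parameter(candidates, labels, k=2)
```

A ranking criterion such as the AUC scores the whole ordering of cases, whereas a top-k loss of this kind only looks at the cases that would actually be investigated.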

In silico proof of principle of machine learning-based antibody design at unconstrained scale.

Akbar, Rahmad; Robert, Philippe A; Weber, Cédric R; Widrich, Michael; Frank, Robert; Pavlovic, Milena; Scheffer, Lonneke; Chernigovskaya, Maria; Snapkov, Igor; Slabodkin, Andrei; Mehta, Brij Bhushan; Miho, Enkelejda; Lund-Johansen, Fridtjof; Andersen, Jan Terje; Hochreiter, Sepp; Hobæk Haff, Ingrid; Klambauer, Günter; Sandve, Geir Kjetil; Greiff, Victor.

MAbs ; 14(1): 2031482, 2022.

Article in English | MEDLINE | ID: mdl-35377271

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.

Subjects

Antigen-Antibody Reactions, Machine Learning, Monoclonal Antibodies/chemistry, Antibody Binding Sites, Epitopes
